INTRODUCTION


In estimating population gene frequencies a statistical complication arises 
when related individuals are included in the sample. This problem, recently 
discussed by Fisher (1940) in relation to recessive characteristics, can be 
referred to a fundamental consequence of heredity. Since related indi- 
viduals share, in part, the same genes—same in the sense of common origin 
as well as like in phenotypic effect—it follows that a given gene may be 
counted repeatedly if all individuals are treated as though unrelated. Conse-
quently, an estimate of gene frequency derived from a sample of n related
individuals will not have the same precision as one established upon n unre-
lated cases, and the standard error will be underestimated if one employs
the ordinary formula appropriate to a series of n independent observations.

In statistical studies on the blood groups and other inherited characters 
it is probable that gene frequency estimates have been computed from time 
to time without regard to the loss of precision resulting from the inclusion 
of related material. In the extreme case where the 2 relatives are monozy-
gous twins, it is likely that such pairs would always be counted as a single
individual, or, in the case of genes lacking dominance, as an observation of 
only 2 genes. On the other hand, pairs of parent and child, who must always 
share 1 gene at each locus, would customarily be counted as 2 individuals or 
4 genes, rather than 3 or some smaller number. 

The importance of this loss of information due to relationship will be 
appreciated in connection with tests of significance. Let us suppose that an 
investigator, having prepared antisera for testing the M and N agglutinogens, 
finds, among the first 4 families tested, 1 family having both parents and
4 children of the genotype MM and 3 families of the sort (MM × MN) with
a total of 6 MM and 6 MN children. When all individuals are counted as
unrelated, the proportion of N genes is 9/48. If it is now desired to compare
this with the proportion 40/100, found by another investigator in a sample
of 50 unrelated persons, the value of chi-square is χ² = 7.26. The difference
in gene frequency would, therefore, be judged significant at the 0.01 proba- 
bility level, and one might suspect a true differentiation of the populations 
or possible errors in serological procedure. Actually, however, when the 
true ratio is approximately 1:2:1, the finding of 15 MM, 9 MN, and 0 NN 
is not especially improbable if these individuals are contained in only 4 
families. For, when the children of these families are discounted on the 
ground that they possess no genes not already counted in the parents, the 
proportion of N genes is taken as 3/16, a value which, being based on only 
16 genes, does not differ significantly from 40/100 (χ² = 2.67). If the second
sample had also contained relatives, the significance of the difference would 
have been exaggerated even more by a failure to consider relationships. 
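As a check on the arithmetic of this example, the short Python sketch below tallies N genes with and without the children. The helper names (`n_genes`, `tally`) are hypothetical, and because the text gives only the totals of 6 MM and 6 MN children for the 3 MM × MN families, splitting them 2 + 2 per family is an assumption that does not affect the counts.

```python
def n_genes(genotype: str) -> int:
    """Number of N alleles in an M-N genotype such as 'MM', 'MN' or 'NN'."""
    return genotype.count("N")


def tally(families, parents_only=False):
    """Return (N genes, total genes), counting every listed person as unrelated."""
    n_alleles = total = 0
    for fam in families:
        members = fam["parents"] + ([] if parents_only else fam["children"])
        for genotype in members:
            n_alleles += n_genes(genotype)
            total += 2
    return n_alleles, total


# 1 family MM x MM with 4 MM children, and 3 families MM x MN whose
# 6 MM and 6 MN children are split evenly (an assumption for illustration).
families = [{"parents": ["MM", "MM"], "children": ["MM"] * 4}] + [
    {"parents": ["MM", "MN"], "children": ["MM", "MM", "MN", "MN"]}
    for _ in range(3)
]

print(tally(families))                     # (9, 48)
print(tally(families, parents_only=True))  # (3, 16)
```

The 48 genes of the first tally carry no more information about the population than the 16 parental genes, because every gene in a child is a copy of one already counted in a parent.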

The method just used, namely that of counting only the parents, is suit- 
able only for whole families tested for genes lacking dominance. It cannot 
be applied without sacrificing information when one or neither parent has 
been tested, even in the simplest case of 2 genes without dominance. In the 
case of genes showing dominance it is in no case appropriate, since, even
when both parents are recorded, the children may provide further informa- 
tion about the parental genotypes. Fisher (1940) has outlined 2 general 
procedures for determining the proportion of recessives, or of recessive 
genes, when the data contain relatives. Both of his scoring systems are 
derived by the maximum likelihood method of estimation and are wholly 
efficient for all types of relationship. At present, however, numerical tables 
of the scores and weights have been made available only for dominant charac- 
teristics in pairs of parent and child, half sibs, and full sibs. 

It is the purpose of the present paper to examine a simpler procedure, 
namely that of assigning a fixed weight to each gene or individual after all 
individuals have been counted as though unrelated, the weight varying only 
with the type of relationship exhibited. The method is therefore similar to 
Fisher’s ‘‘fixed weight’’ system, in that each related group is given a score 
and a weight. In the present system, however, the score bears the same ratio
to the weight as is shown by the unweighted numbers of genes or individuals 
in a direct enumeration, whereas no such simple proportionality exists 
between the likelihood scores and weights. This feature obviates the neces- 
sity of tabulating scores, thereby greatly reducing the number of numerical 
tables required, especially for large families. It further simplifies calcula-
tion by permitting the pooling of unweighted frequencies for all families or
units of a given type, irrespective of their phenotypic composition. 

This method, although always consistent, is generally inefficient and will, 
therefore, yield standard errors somewhat larger than those which could be
obtained by means of likelihood scores and weights. The seriousness of this 
reduction in precision is naturally less in some situations than in others. In 
general, however, the direct weighting system should be regarded merely as 
a provisional method and should be discarded whenever numerical tables for 
a more efficient scoring system become available. Even then, the simpler 
method may be useful in forming a trial estimate in those cases in which the 
likelihood method requires an approximation procedure. An examination 
of the direct weighting method will also be of interest in showing to what 
extent the precision of gene ratios may be exaggerated when the same equa- 
tions of estimation are employed for both related and unrelated material. 


GENES WITHOUT DOMINANCE 


In the case of 2 alleles without dominance, as exemplified by the M-N
blood types, all 3 genotypes are phenotypically distinguishable. If a sample
of k unrelated individuals contains a MM, b MN, and c NN, the proportion
of N genes is taken as

$$q = \frac{b + 2c}{2k}, \tag{1}$$

and the amount of information respecting q, being the reciprocal of the
sampling variance, is

$$i(q) = \frac{2k}{q(1-q)}. \tag{2}$$
When the k individuals include relatives, the same equation of estimation
(1) will provide a consistent estimate of the population gene proportion,
even though all individuals are counted as unrelated. Owing to relationship,
however, the amount of information is reduced to

$$i(q) = \frac{2kw}{q(1-q)}, \tag{3}$$

where w is a fraction always less than unity. Having determined the value
of w for a given type of relationship, one may apply it to both the numerator
and denominator of (1), obtaining scores (wx) and weights (wy), where
x = b + 2c and y = 2k are the unweighted numbers of N genes and of all genes;
these may be conveniently summed for various groups of related or unrelated
persons. The combined estimate of q for any body of genetical data will
therefore be

$$q = \frac{S\{q\,i(q)\}}{S\{i(q)\}} = \frac{S(wx)}{S(wy)}, \tag{4}$$

and its estimated variance will be given by

$$V(q) = \frac{q(1-q)}{S(wy)}. \tag{5}$$
If there are n sets of relatives of a given type of relationship, each set
containing k individuals, the variance of the estimate q = S(x)/2kn may be
written as

$$V(q) = \frac{1}{4k^2n^2}\left\{S(x^2) - \frac{S^2(x)}{n}\right\}.$$

Hence, if P is the probability of a set containing x genes of the sort N, the
expected variance is

$$V(q) = \frac{1}{4k^2n}\left\{S(Px^2) - S^2(Px)\right\},$$

or, since q = S(Px)/2k, because of the property of consistency,

$$V(q) = \frac{1}{4k^2n}\left\{S(Px^2) - 4k^2q^2\right\}. \tag{6}$$


Thus, the information furnished by a single set of k relatives when all indi-
viduals are counted as unrelated is

$$i(q) = \frac{4k^2}{S(Px^2) - 4k^2q^2}, \tag{7}$$

and the weight to be assigned each gene in such an enumeration is

$$w = \frac{q(1-q)}{2k}\,i(q) = \frac{2kq(1-q)}{S(Px^2) - 4k^2q^2}. \tag{8}$$
The procedure may be illustrated first for the case of n unrelated indi-
viduals (Table I). Here k = 1, and the probabilities of the three genotypes,
assuming genetic equilibrium, are (1 − q)², 2q(1 − q), and q² for MM, MN,
and NN, respectively.


TABLE I

Genotype    N Genes    Total Genes    Probability    Px²
            x          2k             P
MM          0          2              (1 − q)²       0
MN          1          2              2q(1 − q)      2q − 2q²
NN          2          2              q²             4q²
                                      Total          2q + 2q²


Equation (6) leads to the familiar formula for the variance of a proportion
established upon 2n independent observations:

$$V(q) = \frac{2q + 2q^2 - 4q^2}{4n} = \frac{q(1-q)}{2n}.$$

The information per individual in an unrelated sample is therefore 2/q(1 − q),
and the weight per gene is w = i(q)q(1 − q)/2k = 1.

We may now apply the same procedure to the case of parent-child pairs, 
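the results being shown in Table II.

TABLE II

Parent-Child    N Genes    Total Genes    Probability     Px²
Pair            x          2k             P
MM, MM          0          4              (1 − q)³        0
MM, MN          1          4              2q(1 − q)²      2q(1 − q)²
MN, MN          2          4              q(1 − q)        4q(1 − q)
MN, NN          3          4              2q²(1 − q)      18q²(1 − q)
NN, NN          4          4              q³              16q³
                                          Total           6q + 10q²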
The variance of q determined from n pairs of parent and child is thus

$$V(q) = \frac{6q + 10q^2 - 16q^2}{16n} = \frac{3q(1-q)}{8n},$$

and i(q) = 8/3q(1 − q). Hence, each gene counted in parent and child will
be evaluated as

$$w = \frac{q(1-q)}{4}\cdot\frac{8}{3q(1-q)} = \frac{2}{3},$$

or the entire parent-child pair as 4w = 2⅔ genes.
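Equation (8) can be evaluated mechanically from any list of (P, x) classes. The Python sketch below, using exact rational arithmetic from the standard `fractions` module, is one way to do this; `weight_per_gene` is a hypothetical helper, and the class lists are transcribed from Tables I and II.

```python
from fractions import Fraction as F


def weight_per_gene(classes, k, q):
    """Eq. (8): w = 2kq(1-q) / {S(Px^2) - 4k^2 q^2}.

    classes: (P, x) pairs, where P is the probability of a set of k relatives
    and x is its number of N genes when everyone is counted as unrelated.
    """
    s_px2 = sum(p * x * x for p, x in classes)
    return 2 * k * q * (1 - q) / (s_px2 - 4 * k * k * q * q)


q = F(3, 10)  # arbitrary; the weights below do not depend on q
unrelated = [((1 - q) ** 2, 0), (2 * q * (1 - q), 1), (q ** 2, 2)]  # Table I
parent_child = [                                                     # Table II
    ((1 - q) ** 3, 0),          # MM, MM
    (2 * q * (1 - q) ** 2, 1),  # MM, MN
    (q * (1 - q), 2),           # MN, MN
    (2 * q ** 2 * (1 - q), 3),  # MN, NN
    (q ** 3, 4),                # NN, NN
]

print(weight_per_gene(unrelated, 1, q))     # 1
print(weight_per_gene(parent_child, 2, q))  # 2/3
```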

Since every parent-child pair is certain to share only 1 gene at each locus, 
it may be wondered why such pairs do not tend to count as 3 genes instead
of 2⅔. The explanation is found in the occurrence of (MN, MN) pairs. For
all other pairs the exact number of M and N genes can be specified; for 
example, a pair of the sort (MN, NN) contains definitely 1 M and 2 N genes. 
In (MN, MN) pairs, however, it is only certain that at least 1 M and 1 N 
gene is represented; the third gene, although certainly present, cannot be 
identified. Hence, such pairs, comprising an expected fraction q(1—q) of 
all pairs, are logically worth only 2 genes, while the remainder are worth 3. 
This suggests that an estimate of q could be derived from parent-child pairs
for which the quantity of information would be

$$I(q) = \frac{2q(1-q) + 3\{1 - q(1-q)\}}{q(1-q)} = \frac{3 - q(1-q)}{q(1-q)},$$
and such is, in fact, the amount of information which would be afforded by 
maximum likelihood scores when applied to parent-child pairs in the case of 
genes lacking dominance. Such scores, being fully efficient, utilize the whole 
of the information inherent in the data, which is defined (Fisher, 1935) by

$$I(q) = S\left\{\frac{1}{P}\left(\frac{\partial P}{\partial q}\right)^2\right\},$$

where P, as before, stands for the probability of any class. The determina-
tion of this quantity for parent-child pairs is shown in Table III.


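TABLE III

Parent-Child    P              ∂P/∂q               q(1 − q)(1/P)(∂P/∂q)²
Pair
MM, MM          (1 − q)³       −3(1 − q)²          9q − 18q² + 9q³
MM, MN          2q(1 − q)²     2(1 − q)(1 − 3q)    2 − 14q + 30q² − 18q³
MN, MN          q(1 − q)       1 − 2q              1 − 4q + 4q²
MN, NN          2q²(1 − q)     2q(2 − 3q)          8q − 24q² + 18q³
NN, NN          q³             3q²                 9q² − 9q³
                                       Total       3 − q(1 − q)

$$I(q) = S\left\{\frac{1}{P}\left(\frac{\partial P}{\partial q}\right)^2\right\} = \frac{3 - q(1-q)}{q(1-q)}$$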
The relative efficiency of the direct weighting system for genes without
dominance in parent-child pairs is therefore the ratio,

$$E = \frac{i(q)}{I(q)} = \frac{8}{3q(1-q)}\cdot\frac{q(1-q)}{3 - q(1-q)} = \frac{8}{9 - 3q(1-q)}.$$
This quantity is plotted in Figure 1, which shows that the direct method has
a maximum efficiency of 96.97 per cent when the 2 genes M and N are equally 
frequent and that the efficiency falls to 8/9 as either gene becomes very rare. 
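The ratio above can also be checked numerically. This Python sketch computes I(q) from the Table III probabilities by central differences (a numerical shortcut assumed here, not the analytic derivatives of the table) and compares the resulting efficiency with 8/{9 − 3q(1 − q)}; the function names are illustrative only.

```python
def pc_probs(q):
    """Class probabilities P for parent-child pairs (Table III), no dominance."""
    return [(1 - q) ** 3, 2 * q * (1 - q) ** 2, q * (1 - q),
            2 * q ** 2 * (1 - q), q ** 3]


def fisher_information(probs, q, h=1e-6):
    """I(q) = S{(1/P)(dP/dq)^2}, with dP/dq taken by central differences."""
    lower, upper, mid = probs(q - h), probs(q + h), probs(q)
    return sum(((b - a) / (2 * h)) ** 2 / p for a, b, p in zip(lower, upper, mid))


def efficiency(q):
    """Information used by the direct weights, 8/3q(1-q), divided by I(q)."""
    return (8 / (3 * q * (1 - q))) / fisher_information(pc_probs, q)


for q in (0.5, 0.2, 0.01):
    print(q, round(efficiency(q), 4), round(8 / (9 - 3 * q * (1 - q)), 4))
```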


Families of s Children—Neither Parent Tested 


Probably the most common unit of genetical study in human investiga- 
tions is the single family, that is, a sibship having neither, one, or both 
parents tested. Table IV lists the various compositions of families of s chil-
dren tested with respect to 2 allelic genes, M and N. It also gives the proba-
bilities and unweighted scores, x, the latter being simply the numbers of N
genes per family when all genes are counted as though unrelated. For 
families having neither parent tested, the table shows that the uncorrected 
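sum of squares is

$$\begin{aligned} S(Px^2) ={}& 4q(1-q)^3(\tfrac12)^s\,\Sigma\{C_u^s u^2\} + 4q^2(1-q)^2\,\Sigma\{C_{uv}^s(\tfrac12)^u(\tfrac14)^{s-u}(u+2v)^2\} \\ &+ 2q^2(1-q)^2s^2 + 4q^3(1-q)(\tfrac12)^s\,\Sigma\{C_u^s(s+u)^2\} + 4q^4s^2, \end{aligned}$$

where Σ stands for summation by integers from u = 0 to u = s and from v = 0
to v = s (with u + v ≤ s), and where

$$C_u^s = \frac{s!}{(s-u)!\,u!}, \qquad C_{uv}^s = \frac{s!}{(s-u-v)!\,u!\,v!}.$$

This reduces to

$$\begin{aligned} S(Px^2) &= q(1-q)^3s(s+1) + q^2(1-q)^2(4s^2+2s) + 2q^2(1-q)^2s^2 + q^3(1-q)(9s^2+s) + 4q^4s^2 \\ &= qs(s+1) + q^2s(3s-1). \end{aligned}$$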
Since the total number of genes per sibship is 2k = 2s,

$$S(Px^2) - 4k^2q^2 = q(1-q)s(s+1);$$

hence

$$V(q) = \frac{1}{4k^2n}\{S(Px^2) - 4k^2q^2\} = \frac{q(1-q)(s+1)}{4sn},$$

$$i(q) = \frac{4s}{q(1-q)(s+1)},$$

and

$$w = \frac{q(1-q)}{2k}\,i(q) = \frac{2}{s+1}. \tag{9}$$


The weight per gene in sibships of s without recorded parents is thus a 
function of s alone, being independent of the parametric gene proportion, q.
This will be found to hold true generally for the simple weighting system 
when applied to data on genes lacking dominance, although the same is not 
true for the likelihood scores and weights. When s = 1, the weight (9) be- 
comes w = 1, since one is then dealing with single unrelated individuals. For 
sib-pairs, s = 2 and w = 2/3, which is the same as the weight found for parent-
child pairs. However, the information furnished by sib-pairs is, in this
instance, less than that found for parent-child pairs, and amounts to

$$I(q) = \frac{18 + 15q - 3q^2 - 24q^3 + 12q^4}{3q(1-q)(2+q-q^2)(1+q-q^2)}.$$
The efficiency of the simple scoring method for sib-pairs is therefore

$$E = \frac{i(q)}{I(q)} = \frac{8(1+q-q^2)(2+q-q^2)}{18 + 15q - 3q^2 - 24q^3 + 12q^4},$$
a quantity, which, though generally greater than the corresponding value for 
parent-child pairs, takes the same limit (8/9) as q approaches 0 or 1 (Fig. 1). 
Numerical values of the weight (9) are given in Table VIII. 
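The sib-pair information can be checked the same way. The sketch below assumes the general pair probabilities of (12), with ψ0 = ψ1 = ψ2 = ¼ for full sibs, differentiates them numerically, and compares the result with the closed form for I(q) and with the parent-child efficiency; all names are hypothetical.

```python
def sib_pair_probs(q, psi0=0.25, psi1=0.25, psi2=0.25):
    """Six genotype-combination probabilities of eq. (12); defaults are full sibs."""
    return [
        (1 - q) ** 2 * (psi2 + 2 * psi1 * (1 - q) + psi0 * (1 - q) ** 2),  # MM, MM
        4 * q * (1 - q) ** 2 * (psi1 + psi0 * (1 - q)),                  # MM, MN
        2 * (1 - q) ** 2 * psi0 * q ** 2,                                # MM, NN
        2 * q * (1 - q) * (psi2 + psi1 + 2 * psi0 * q * (1 - q)),        # MN, MN
        4 * q ** 2 * (1 - q) * (psi1 + psi0 * q),                        # MN, NN
        q ** 2 * (psi2 + 2 * psi1 * q + psi0 * q ** 2),                  # NN, NN
    ]


def fisher_information(probs, q, h=1e-6):
    """I(q) = S{(1/P)(dP/dq)^2}, derivatives by central differences."""
    lower, upper, mid = probs(q - h), probs(q + h), probs(q)
    return sum(((b - a) / (2 * h)) ** 2 / p for a, b, p in zip(lower, upper, mid))


def closed_form_information(q):
    """I(q) for sib-pairs as given in the text."""
    num = 18 + 15 * q - 3 * q ** 2 - 24 * q ** 3 + 12 * q ** 4
    return num / (3 * q * (1 - q) * (2 + q - q ** 2) * (1 + q - q ** 2))


def sib_efficiency(q):
    """Direct-weight information 8/3q(1-q) divided by the full information."""
    return (8 / (3 * q * (1 - q))) / fisher_information(sib_pair_probs, q)


for q in (0.5, 0.2, 0.01):
    print(q, round(fisher_information(sib_pair_probs, q), 4),
          round(closed_form_information(q), 4), round(sib_efficiency(q), 4))
```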


Families of s Children—One Parent Tested 


For families having 1 parent and s children recorded, the uncorrected
sum of squares is found from Table IV to be

$$\begin{aligned} S(Px^2) ={}& 2q(1-q)^3(\tfrac12)^s\,\Sigma\{C_u^su^2\} + 2q(1-q)^3(\tfrac12)^s\,\Sigma\{C_u^s(u+1)^2\} + q^2(1-q)^2s^2 \\ &+ q^2(1-q)^2(s+2)^2 + 4q^2(1-q)^2\,\Sigma\{C_{uv}^s(\tfrac12)^u(\tfrac14)^{s-u}(u+2v+1)^2\} \\ &+ 2q^3(1-q)(\tfrac12)^s\,\Sigma\{C_u^s(s+u+1)^2\} + 2q^3(1-q)(\tfrac12)^s\,\Sigma\{C_u^s(s+u+2)^2\} \\ &+ 4q^4(s+1)^2 \\ ={}& \tfrac12q(1-q)^3s(s+1) + \tfrac12q(1-q)^3(s+1)(s+4) \\ &+ q^2(1-q)^2s^2 + q^2(1-q)^2(s+2)^2 \\ &+ 2q^2(1-q)^2(2s^2+5s+2) + \tfrac12q^3(1-q)(9s^2+13s+4) \\ &+ \tfrac12q^3(1-q)(9s^2+25s+16) + 4q^4(s+1)^2 \\ ={}& q(s+1)(s+2) + q^2(s+1)(3s+2). \end{aligned}$$

Here k = s + 1; hence,

$$S(Px^2) - 4k^2q^2 = q(1-q)(s+1)(s+2),$$

$$i(q) = \frac{4(s+1)}{q(1-q)(s+2)},$$

and

$$w = \frac{2}{s+2}. \tag{10}$$


The weight for a family with 1 parent and s children is thus identical 
with the weight for a sibship of (s+1) members without recorded parents. 
A single tested parent counts the same as an additional sib (cf. Table VIII).


Families of s Children—Both Parents Tested 


For families having both parents recorded, Table IV shows that

$$\begin{aligned} S(Px^2) ={}& 4q(1-q)^3(\tfrac12)^s\,\Sigma\{C_u^s(u+1)^2\} + 4q^2(1-q)^2\,\Sigma\{C_{uv}^s(\tfrac12)^u(\tfrac14)^{s-u}(u+2v+2)^2\} \\ &+ 2q^2(1-q)^2(s+2)^2 + 4q^3(1-q)(\tfrac12)^s\,\Sigma\{C_u^s(s+u+3)^2\} + q^4(2s+4)^2 \\ ={}& q(1-q)^3(s+1)(s+4) + 2q^2(1-q)^2(2s^2+9s+8) \\ &+ 2q^2(1-q)^2(s+2)^2 + q^3(1-q)(9s^2+37s+36) + q^4(2s+4)^2. \end{aligned}$$

The number of genes observed per family is 2k = 2s + 4, so that

$$S(Px^2) - 4k^2q^2 = q(1-q)(s+1)(s+4),$$

$$i(q) = \frac{4(s+2)^2}{q(1-q)(s+1)(s+4)}, \tag{11}$$

and

$$w = \frac{2(s+2)}{(s+1)(s+4)}.$$
The weight to be assigned the entire family when both parents and s
children are counted is

$$2kw = \frac{4(s+2)^2}{(s+1)(s+4)}.$$

Now, when s = 0, this is equal to 4, since the ‘‘family’’ then consists of 2 unre- 
lated parents. For all positive values of s, however, the weight per family 
is less than 4, being minimum when there are 2 recorded children. For 
example, with s = 1, 2, 3, 4, 5, 6, one has 2kw = 3.6000, 3.5556, 3.5714, 3.6000, 
3.6296, 3.6571, etc., tending to 4 as s increases indefinitely. 
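The weights (9), (10) and the both-parents weight can be reproduced by enumerating every mating and every assortment of children exactly, rather than through the summations over C_u^s and C_uv^s. The Python sketch below does this with exact fractions; it assumes random mating at Hardy-Weinberg equilibrium, as the text does, and `family_weight` and its helpers are hypothetical names.

```python
from fractions import Fraction as F
from itertools import product


def genotype_probs(q):
    """Hardy-Weinberg frequencies keyed by number of N alleles (MM=0, MN=1, NN=2)."""
    return {0: (1 - q) ** 2, 1: 2 * q * (1 - q), 2: q ** 2}


def child_dist(g1, g2):
    """Distribution of N alleles in one child of parents carrying g1 and g2 N alleles."""
    a, b = F(g1, 2), F(g2, 2)  # chance that each parent transmits N
    return {0: (1 - a) * (1 - b), 1: a * (1 - b) + (1 - a) * b, 2: a * b}


def convolve(d1, d2):
    """Distribution of the sum of two independent counts."""
    out = {}
    for x1, p1 in d1.items():
        for x2, p2 in d2.items():
            out[x1 + x2] = out.get(x1 + x2, 0) + p1 * p2
    return out


def family_weight(s, tested, q):
    """Eq. (8) weight per gene for s children with 0, 1 or 2 tested parents."""
    gp = genotype_probs(q)
    dist = {}  # distribution of x over all matings
    for g1, g2 in product((0, 1, 2), repeat=2):
        start = (0, g1, g1 + g2)[tested]  # N genes counted in the tested parents
        d = {start: gp[g1] * gp[g2]}
        for _ in range(s):
            d = convolve(d, child_dist(g1, g2))
        for x, p in d.items():
            dist[x] = dist.get(x, 0) + p
    k = s + tested
    s_px2 = sum(p * x * x for x, p in dist.items())
    return 2 * k * q * (1 - q) / (s_px2 - 4 * k * k * q * q)


q = F(1, 3)
for s in range(1, 7):
    print(s, family_weight(s, 0, q), family_weight(s, 1, q),
          family_weight(s, 2, q), round(float(2 * (s + 2) * family_weight(s, 2, q)), 4))
```

For each s the three fractions should agree with 2/(s + 1), 2/(s + 2) and 2(s + 2)/{(s + 1)(s + 4)} whatever value of q is chosen, and the last column gives the whole-family weights 2kw listed above.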

One observes the somewhat curious fact that the inclusion of the children 
of 2 unrelated parents reduces the precision of the gene frequency estimate 
provided by the parents alone, but this appears entirely logical when it is 
remembered that the genotypes of the parents are completely specified by 
their phenotypes in the case of genes lacking dominance. The children, 
therefore, add nothing to the precision of the estimate, but actually detract 
from it, owing to chance deviations between the ratios of M and N genes in 
the 2 generations, such deviations tending to disappear only when an infinite
sample of the parents’ gametes has been obtained. When the parental geno- 
types are known, the probabilities of the various assortments of children are 
no longer functions of the population gene ratio, so that the maximum likeli- 
hood weight is always 4. To ignore the children in such families would be 
to substitute an efficient method for an inefficient one, and there would seem to be no objection to
such a departure from the usual procedure since the direct weighting system 
will necessarily have a varying efficiency for different groups of relatives. 




Pairs of Relatives of any Kind or Degree 


Genetical data may occasionally contain pairs of relatives other than full 
sibs or parent and child, such as half sibs, aunt-nephew, first cousins, and 
so on. For the weighting of such pairs one may now consider the infor- 
mation supplied by any pair of relatives whose coefficient of relationship 
(Wright, 1921) is r, and who are both members of a regular pedigree, i.e.,
one lacking inbreeding.

Let ψ0, ψ1, ψ1, ψ2 be the probabilities that an individual will share neither,
the one, the other, or both genes at a given locus with a relative of degree r.
Then, for any kind of relationship in regular pedigrees, ψ1 + ψ2 = r. The
probabilities of the various genotype combinations in pairs of relatives may 
now be expressed as follows: 


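$$\begin{aligned}
(\mathrm{MM}, \mathrm{MM}) &= (1-q)^2\{\psi_2 + 2\psi_1(1-q) + \psi_0(1-q)^2\} \\
(\mathrm{MM}, \mathrm{MN}) &= 4q(1-q)^2\{\psi_1 + \psi_0(1-q)\} \\
(\mathrm{MM}, \mathrm{NN}) &= 2(1-q)^2\psi_0q^2 \\
(\mathrm{MN}, \mathrm{MN}) &= 2q(1-q)\{\psi_2 + \psi_1 + 2\psi_0q(1-q)\} \\
(\mathrm{MN}, \mathrm{NN}) &= 4q^2(1-q)\{\psi_1 + \psi_0q\} \\
(\mathrm{NN}, \mathrm{NN}) &= q^2\{\psi_2 + 2\psi_1q + \psi_0q^2\}
\end{aligned} \tag{12}$$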
Multiplying these successively by the squares of the unweighted numbers of
N genes, which are x² = 0, 1, 4, 4, 9, 16, respectively, and summing,

$$S(Px^2) = q(8\psi_2 + 12\psi_1 + 4\psi_0) + q^2(8\psi_2 + 20\psi_1 + 12\psi_0).$$

Putting ψ2 + 2ψ1 + ψ0 = 1, and subtracting 4k²q² = 16q², gives

$$S(Px^2) - 4k^2q^2 = 4q(1-q)(1 + \psi_1 + \psi_2) = 4q(1-q)(1+r).$$

Hence

$$V(q) = \frac{1}{16n}\{S(Px^2) - 4k^2q^2\} = \frac{q(1-q)(1+r)}{4n},$$

$$i(q) = \frac{4}{q(1-q)(1+r)},$$

and

$$w = \frac{1}{1+r}. \tag{13}$$

For full sibs and parent-child pairs (r = ½) the weight per gene is w = 2/3,
as noted above; for monozygous twins (r = 1), w = 1/2; for half sibs (r = ¼),
w = 4/5; for first cousins (r = ⅛), w = 8/9; etc.
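Formula (13) can be checked against the pair probabilities (12) for particular ψ values. The sketch below is a minimal Python version under the same regular-pedigree assumption; the ψ triples listed for each relationship are the standard values consistent with ψ1 + ψ2 = r and ψ0 + 2ψ1 + ψ2 = 1, and the function name is hypothetical.

```python
from fractions import Fraction as F


def pair_weight(q, psi0, psi1, psi2):
    """Weight per gene, eq. (8) with k = 2, from the pair probabilities (12)."""
    classes = [  # (P, x), x = N genes counted in the pair
        ((1 - q) ** 2 * (psi2 + 2 * psi1 * (1 - q) + psi0 * (1 - q) ** 2), 0),
        (4 * q * (1 - q) ** 2 * (psi1 + psi0 * (1 - q)), 1),
        (2 * (1 - q) ** 2 * psi0 * q ** 2, 2),
        (2 * q * (1 - q) * (psi2 + psi1 + 2 * psi0 * q * (1 - q)), 2),
        (4 * q ** 2 * (1 - q) * (psi1 + psi0 * q), 3),
        (q ** 2 * (psi2 + 2 * psi1 * q + psi0 * q ** 2), 4),
    ]
    # Exact check, meant for Fraction inputs with psi0 + 2*psi1 + psi2 = 1.
    assert sum(p for p, _ in classes) == 1
    s_px2 = sum(p * x * x for p, x in classes)
    k = 2
    return 2 * k * q * (1 - q) / (s_px2 - 4 * k * k * q * q)


q = F(2, 5)
relatives = {  # (psi0, psi1, psi2); r = psi1 + psi2
    "parent-child": (F(0), F(1, 2), F(0)),
    "full sibs": (F(1, 4), F(1, 4), F(1, 4)),
    "monozygous twins": (F(0), F(0), F(1)),
    "half sibs": (F(1, 2), F(1, 4), F(0)),
    "first cousins": (F(3, 4), F(1, 8), F(0)),
}
for name, psi in relatives.items():
    r = psi[1] + psi[2]
    print(name, pair_weight(q, *psi), 1 / (1 + r))
```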
Fig. 1. Efficiency of the direct weighting of sib-pairs and parent-child pairs in
estimating the gene ratio for characters showing dominance (T_) or lacking dominance
(MN).


GENES WITH DOMINANCE 


In the case of two alleles showing dominance, such as the “secretor” and
“nonsecretor” genes in man, it is more convenient to estimate the proportion
of recessives q², from which the gene frequency q can then readily be ob-
tained. If a sample of k unrelated individuals contains a dominants (T_)
and b recessives (tt), the square of the recessive gene proportion is esti-
mated as

$$q^2 = \frac{b}{a+b} = \frac{b}{k}, \tag{14}$$

and the amount of information respecting q² is

$$i(q^2) = \frac{k}{q^2(1-q^2)}. \tag{15}$$
As in the case of genes lacking dominance, one may determine, for any group
of relatives, a weight

$$w = \frac{q^2(1-q^2)}{k}\,i(q^2), \tag{16}$$

which, in this case, represents the weight per individual, so that the combined
estimate derived from various groups of related or unrelated individuals
will be

$$q^2 = \frac{S(wb)}{S(wk)}, \tag{17}$$

with variance

$$V(q^2) = \frac{q^2(1-q^2)}{S(wk)}. \tag{18}$$

The amount of information respecting the gene proportion, q, is

$$i(q) = \left(\frac{dq^2}{dq}\right)^2 i(q^2) = \frac{4S(wk)}{1-q^2}; \tag{19}$$

hence q will be estimated as

$$q = \sqrt{\frac{S(wb)}{S(wk)}}, \tag{20}$$

with variance

$$V(q) = \frac{1-q^2}{4S(wk)}. \tag{21}$$
By analogy with (7) and (8), the information and weight per individual
in sets of k relatives of a given class may be written:

$$i(q^2) = \frac{k^2}{S(Px^2) - k^2q^4} \tag{22}$$

and

$$w = \frac{kq^2(1-q^2)}{S(Px^2) - k^2q^4}. \tag{23}$$
Families of s Children—Neither Parent Tested 


The probabilities of the various compositions of families of s children
may be obtained from Table IV by combining MM and MN to represent
dominants and NN recessives. Multiplying these probabilities by the squares
of the corresponding numbers of recessives, x², one obtains, for the case of
sibships without tested parents:

$$\begin{aligned} S(Px^2) &= 4q^2(1-q)^2\,\Sigma\{C_u^s(\tfrac34)^{s-u}(\tfrac14)^uu^2\} + 4q^3(1-q)(\tfrac12)^s\,\Sigma\{C_u^su^2\} + q^4s^2 \\ &= \tfrac14q^2(1-q)^2s(s+3) + q^3(1-q)s(s+1) + q^4s^2, \end{aligned}$$

$$S(Px^2) - k^2q^4 = S(Px^2) - q^4s^2 = \tfrac14q^2(1-q)s\{s+3+q(3s+1)\};$$

hence,

$$i(q^2) = \frac{4s}{q^2(1-q)\{s+3+q(3s+1)\}}$$

and

$$w = \frac{4(1+q)}{s+3+q(3s+1)}. \tag{24}$$


When s = 1, the weight becomes w = 1, as expected, but when s ≥ 2, the
weight is always a function of the unknown gene proportion q, so that practi-
cal computation requires a process of interpolation with trial estimates, as is
also true in the use of likelihood scores and weights. For sib-pairs (s = 2),
the information utilized by the direct weighting system is

$$i(q^2) = \frac{8}{q^2(1-q)(5+7q)},$$

whereas the total information available is, as shown by Fisher (1940),

$$I(q^2) = \frac{21 - 5q - 8q^2 + 4q^3}{q^2(1-q)(3+q)(4+4q-3q^2-q^3)};$$

hence, the efficiency is, in this case,

$$E = \frac{i(q^2)}{I(q^2)} = \frac{8(3+q)(4+4q-3q^2-q^3)}{(5+7q)(21-5q-8q^2+4q^3)}.$$

This quantity is plotted in Figure 1, where it is seen that a maximum of 100
per cent efficiency is reached when q = 0.6288 or when the percentage of reces-
sives is 39.54. At this gene ratio the weight assigned to each individual in
the sib-pair,

$$w = \frac{4+4q}{5+7q}, \tag{25}$$

becomes identical with that in the likelihood scoring system.
Numerical values of the weight (24) are given in Table IX for sibships 
containing 2 to 15 members. 
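The location of the maximum can be checked numerically. A little algebra on the efficiency ratio shows that the denominator minus the numerator is (6q² + q − 3)², so the efficiency equals 1 exactly where 6q² + q − 3 = 0, that is at q = (√73 − 1)/12 ≈ 0.6287, close to the value quoted above. The Python sketch below confirms this with a grid search; the function names are illustrative.

```python
import math


def parts(q):
    """Numerator and denominator of the sib-pair efficiency (dominance)."""
    num = 8 * (3 + q) * (4 + 4 * q - 3 * q ** 2 - q ** 3)
    den = (5 + 7 * q) * (21 - 5 * q - 8 * q ** 2 + 4 * q ** 3)
    return num, den


def sib_pair_efficiency(q):
    num, den = parts(q)
    return num / den


grid = [i / 10000 for i in range(1, 10000)]
best = max(grid, key=sib_pair_efficiency)
q0 = (math.sqrt(73) - 1) / 12  # root of 6q^2 + q - 3 = 0
print(best, round(q0, 5), round(100 * q0 ** 2, 2), sib_pair_efficiency(q0))
```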


Families of s Children—One Parent Tested 
For families with 1 parent and s children tested, one obtains from Table
IV

$$\begin{aligned} S(Px^2) ={}& 4q^2(1-q)^2\,\Sigma\{C_u^s(\tfrac34)^{s-u}(\tfrac14)^uu^2\} + q^2(1-q)^2 + 2q^3(1-q)(\tfrac12)^s\,\Sigma\{C_u^su^2\} \\ &+ 2q^3(1-q)(\tfrac12)^s\,\Sigma\{C_u^s(u+1)^2\} + q^4(s+1)^2 \\ ={}& \tfrac14q^2(1-q)^2s(s+3) + q^2(1-q)^2 + \tfrac12q^3(1-q)s(s+1) \\ &+ \tfrac12q^3(1-q)(s+1)(s+4) + q^4(s+1)^2. \end{aligned}$$

Subtracting k²q⁴ = q⁴(s + 1)², one has

$$S(Px^2) - k^2q^4 = \tfrac14q^2(1-q)\{s^2+3s+4+q(3s^2+9s+4)\},$$

$$i(q^2) = \frac{4(s+1)^2}{q^2(1-q)\{(1+3q)s(s+3)+4(1+q)\}},$$

$$w = \frac{4(s+1)(1+q)}{(1+3q)s(s+3)+4(1+q)}. \tag{26}$$

For parent-child pairs (s = 1) this reduces to

$$w = \frac{1+q}{1+2q}, \tag{27}$$
which differs from the weight for sib-pairs (25). In fact, the weight for a 
single parent and s children (26) is generally greater than that for a sibship 
of (s+1) without recorded parents (24), although the 2 weights tend to 
become identical as q approaches unity (compare Tables IX and X). This
is unlike the situation found for the case of genes without dominance. 


The information which parent-child pairs are capable of supplying is
given by Fisher (1940):

$$I(q^2) = \frac{8 + 9q - 12q^2 + 4q^3}{4q^2(1-q)(1+2q)(1+q-q^2)},$$

whence the efficiency of using direct scores and weights for such pairs is

$$E = \frac{i(q^2)}{I(q^2)} = \frac{8(1+q-q^2)}{8+9q-12q^2+4q^3}.$$

This quantity (Fig. 1) reaches 100 per cent at the gene proportions q = 0
and q = ½, at which points the parent and child count as 2w = 2 or 1.5 indi-
viduals, respectively.
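Here too a short calculation is instructive: the denominator of the efficiency minus its numerator is q(1 − 2q)², which vanishes only at q = 0 and q = ½. The Python sketch below checks this identity with exact fractions, along with the pair weights 2w; the helper names are hypothetical.

```python
from fractions import Fraction as F


def pc_weight(q):
    """Eq. (27): weight per individual in a parent-child pair (dominance)."""
    return (1 + q) / (1 + 2 * q)


def pc_efficiency(q):
    """8(1 + q - q^2) / (8 + 9q - 12q^2 + 4q^3), as given in the text."""
    return 8 * (1 + q - q ** 2) / (8 + 9 * q - 12 * q ** 2 + 4 * q ** 3)


for q in (F(0), F(1, 4), F(1, 2), F(3, 4)):
    denom = 8 + 9 * q - 12 * q ** 2 + 4 * q ** 3
    shortfall = q * (1 - 2 * q) ** 2 / denom  # claimed value of 1 - E
    print(q, pc_efficiency(q), 1 - pc_efficiency(q) == shortfall, 2 * pc_weight(q))
```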




Families of s Children—Both Parents Tested 


For families having both parents and s children recorded, Table IV gives

$$\begin{aligned} S(Px^2) &= 2q^2(1-q)^2 + 4q^2(1-q)^2\,\Sigma\{C_u^s(\tfrac34)^{s-u}(\tfrac14)^uu^2\} + 4q^3(1-q)(\tfrac12)^s\,\Sigma\{C_u^s(u+1)^2\} + q^4(s+2)^2 \\ &= 2q^2(1-q)^2 + \tfrac14q^2(1-q)^2s(s+3) + q^3(1-q)(s+1)(s+4) + q^4(s+2)^2, \end{aligned}$$

$$S(Px^2) - k^2q^4 = S(Px^2) - q^4(s+2)^2 = \tfrac14q^2(1-q)\{s^2+3s+8+q(3s^2+17s+8)\},$$

$$i(q^2) = \frac{4(s+2)^2}{q^2(1-q)\{s^2+3s+8+q(3s^2+17s+8)\}},$$

$$w = \frac{4(s+2)(1+q)}{s^2+3s+8+q(3s^2+17s+8)}. \tag{28}$$

Numerical values of this weight are given in Table XI. Multiplying it
by (s + 2), one has the weight for the entire family,

$$(s+2)w = \frac{4(s+2)^2(1+q)}{s^2+3s+8+q(3s^2+17s+8)}.$$

When s = 0, this reduces to 2; however, with s ≥ 1, the weight is not
always equal to or greater than 2. At the limit for rare recessive genes
(q → 0), the family tends to count as 4(s + 2)²/(s² + 3s + 8) individuals;
putting s = 1, 2, 3, 4, 5,..., this becomes 3.0000, 3.5556, 3.8462, 4.0000, 4.0833, 
..., tending to 4 after reaching a maximum of 4.1739 at s=10. At the other 
extreme, where dominants are very rare (q → 1), the weight per family
becomes 2(s + 2)²/(s + 1)(s + 4). This is equal to one-half the weight for
complete families tested for genes lacking dominance (11), so that the last 
row in Table XI duplicates the last column in Table VIII. It will be recalled 
that, in the case of genes lacking dominance, the inclusion of the children of 
2 tested parents always reduces the weight below that for the parents alone. 
In the case of genes showing dominance, one would generally expect the chil- 
dren to provide additional information concerning the parental genotypes, 
except at the extreme (q → 1), at which nearly all families are of the sort
(tt × tt). The likelihood weight would therefore always be 2 or greater. In
the direct weighting method, however, the weight falls below 2 at a point
short of q = 1, where the additional information supplied by the children fails
to compensate for the inefficient method of scoring them. This point varies 
with s, as is shown in Table XI, but, in general, the children will not reduce 
the precision of the estimate unless the recessive gene frequency exceeds 60 
per cent, in which case it would again be better to abandon the direct weight- 


ing method and count only the parents. 
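The dominance weights (24), (26) and (28) can be reproduced by exact enumeration, with x now the number of recessives (NN) among the members counted and the weight taken from (23). The Python sketch below assumes random mating at equilibrium; `dominance_weight` and `closed_form` are hypothetical names, and q = 9/20 is chosen only because it is one of the trial values used in Table VII.

```python
from fractions import Fraction as F
from itertools import product


def dominance_weight(s, tested, q):
    """Eq. (23) weight per individual for s children and 0, 1 or 2 tested parents.

    x counts recessives (NN) among the members counted.
    """
    gp = {0: (1 - q) ** 2, 1: 2 * q * (1 - q), 2: q ** 2}  # keyed by N alleles
    dist = {}
    for g1, g2 in product((0, 1, 2), repeat=2):
        rec = F(g1, 2) * F(g2, 2)  # chance that a child is NN
        start = (0, int(g1 == 2), int(g1 == 2) + int(g2 == 2))[tested]
        d = {start: gp[g1] * gp[g2]}
        for _ in range(s):  # add the children one at a time
            nxt = {}
            for x, p in d.items():
                nxt[x] = nxt.get(x, 0) + p * (1 - rec)
                nxt[x + 1] = nxt.get(x + 1, 0) + p * rec
            d = nxt
        for x, p in d.items():
            dist[x] = dist.get(x, 0) + p
    k = s + tested
    q2 = q * q
    s_px2 = sum(p * x * x for x, p in dist.items())
    return k * q2 * (1 - q2) / (s_px2 - k * k * q2 * q2)


def closed_form(s, tested, q):
    """Weights (24), (26) and (28) as given in the text."""
    if tested == 0:
        return 4 * (1 + q) / (s + 3 + q * (3 * s + 1))
    if tested == 1:
        return 4 * (s + 1) * (1 + q) / ((1 + 3 * q) * s * (s + 3) + 4 * (1 + q))
    return 4 * (s + 2) * (1 + q) / (s * s + 3 * s + 8 + q * (3 * s * s + 17 * s + 8))


q = F(9, 20)
for s in range(1, 5):
    print(s, [round(float(dominance_weight(s, t, q)), 4) for t in (0, 1, 2)])
```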
In the limit for very large families the weight (s + 2)w tends to
4(1 + q)/(1 + 3q), whereas the weight provided by Fisher’s likelihood scoring
method tends to (6 − 3q + q²)(1 + q)/(1 + 3q).* Consequently, the efficiency
of the direct weighting method in this case approaches 4/(6 − 3q + q²), vary-
ing from ⅔ to 1 as q varies from 0 to 1.


Pairs of Relatives of any Kind or Degree 


Combining the probabilities given in (12), one obtains the following
probabilities for the 3 possible combinations of dominants and recessives in
pairs of relatives of any kind or degree:

$$\begin{aligned}
(\mathrm{T\_}, \mathrm{T\_}) &= \psi_0(1-q^2)^2 + 2\psi_1(1-2q^2+q^3) + \psi_2(1-q^2) \\
(\mathrm{T\_}, \mathrm{tt}) &= 2q^2(1-q)(1-\psi_2+\psi_0q) \\
(\mathrm{tt}, \mathrm{tt}) &= q^2(\psi_2 + 2\psi_1q + \psi_0q^2)
\end{aligned} \tag{29}$$

Multiplying these successively by 0, 1, 4, and summing,

$$S(Px^2) = 2q^2(1 + \psi_2 + 2\psi_1q + \psi_0q^2).$$

Putting k = 2, one then has

$$nV(q^2) = \frac{1}{k^2}\{S(Px^2) - k^2q^4\} = \tfrac12q^2(1+\psi_2+2\psi_1q+\psi_0q^2) - q^4$$

and

$$w = \frac{q^2(1-q^2)}{k\,nV(q^2)} = \frac{1-q^2}{1+\psi_2+2\psi_1q+\psi_0q^2-2q^2}. \tag{30}$$

Here w cannot be reduced by means of the relation ψ1 + ψ2 = r, as was
possible in the case of genes lacking dominance. It has already been seen
that the weights for sib-pairs (25) and parent-child pairs (27) differ in the
case of genes showing dominance, even though the coefficient of relationship
is in both cases r = ½. These 2 types of relatives are, however, the simplest
examples of 2 fundamental classes of relationship, “unilineal” and “bi-
lineal” (Cotterman, 1941), for each of which one may obtain a general
formula for w in terms of r. For unilineal relatives, who possess no 2 paths
of descent which are completely different in all their links, one has ψ2 = 0,
ψ1 = r, and ψ0 = 1 − 2r; while for symmetrical bilineal relatives, who possess
2 independent paths containing equal numbers of links, one has ψ2 = r²,
ψ1 = r(1 − r), and ψ0 = (1 − r)². Making these substitutions in (30), one has

$$w_u = \frac{1+q}{1+q+2rq} \quad \text{for unilineal relatives,} \tag{31}$$

$$w_b = \frac{1+q}{1+q+2rq+r^2(1-q)} \quad \text{for bilineal relatives.} \tag{32}$$

Thus, when r = 1, we have w_b = ½, which is the weight per individual in
the case of monozygous twins; when r = ½, w_u gives the weight previously
found for parent and child (27), and w_b gives the weight for sib-pairs (25);
when r = ¼, one has w_u = (2 + 2q)/(2 + 3q) for half sibs, grandparent-child,
or aunt-nephew, etc.; and w_b = (16 + 16q)/(17 + 23q) for double first cousins;
and so on.

* The weight given by Fisher (1940, p. 166) as (6 − 3q + q²)(1 − q)/(1 + 3q) is evi-
dently a typographical error.
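Equations (30) to (32) are easy to evaluate for particular relationships. In the Python sketch below, with exact fractions, the unilineal and bilineal cases are obtained simply by substituting their ψ values into (30); the function names are hypothetical.

```python
from fractions import Fraction as F


def w_pair(q, psi0, psi1, psi2):
    """Eq. (30): weight per individual for a pair of relatives (dominance)."""
    return (1 - q * q) / (1 + psi2 + 2 * psi1 * q + psi0 * q * q - 2 * q * q)


def w_unilineal(q, r):
    """psi2 = 0, psi1 = r, psi0 = 1 - 2r substituted into (30)."""
    return w_pair(q, 1 - 2 * r, r, 0)


def w_bilineal(q, r):
    """psi2 = r^2, psi1 = r(1 - r), psi0 = (1 - r)^2 substituted into (30)."""
    return w_pair(q, (1 - r) ** 2, r * (1 - r), r * r)


q = F(2, 7)
half, quarter = F(1, 2), F(1, 4)
print(w_unilineal(q, half), (1 + q) / (1 + 2 * q))            # parent-child, (27)
print(w_bilineal(q, half), 4 * (1 + q) / (5 + 7 * q))         # full sibs, (25)
print(w_bilineal(q, F(1)))                                    # monozygous twins: 1/2
print(w_unilineal(q, quarter), (2 + 2 * q) / (2 + 3 * q))     # half sibs
print(w_bilineal(q, quarter), (16 + 16 * q) / (17 + 23 * q))  # double first cousins
```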
NUMERICAL EXAMPLES


Table V illustrates the application of the direct weighting method to data 
on a pair of genetic factors lacking dominance. The record consists of the 
M-N blood types of all subjects examined at the University of Michigan
Heredity Clinic prior to December 31, 1942, exclusive of related groups
larger than a single family. The data are arranged according to the type of 
family, i.e., to the numbers of tested parents and children. The frequencies 
of MM, MN, and NN are then pooled for families of each type, and the num- 
bers of N genes and total genes are counted without regard to relationship. 
The weights to be assigned each gene in the aggregate for each type of 
family are obtained from Table VIII. 


TABLE V
ESTIMATION OF GENE RATIO FOR GENES LACKING DOMINANCE (M-N BLOOD TYPES)

Number of   Number of                        N Genes    Total Genes   Weight
Parents     Children    MM     MN     NN     (b + 2c)   (2k)          per Gene
Tested      s           a      b      c      x          y             w
Unrelated               23     54     20     94         194           1.000000
2           1           20     31     18     67         138            .600000
2           2           25     42     13     68         160            .444444
2           3           17     26      7     40         100            .357143
2           4            0      5      1      7          12            .300000
2           5            2      8      4     16          28            .259259
2           6            5      3      0      3          16            .228571
2           8            4      5      1      7          20            .185185
1           1            4     10      2     14          32            .666667
1           2            8     16      6     28          60            .500000
1           3            1      6      1      8          16            .400000
1           5            4      2      0      2          12            .285714
0           2            3     15      8     31          52            .666667
0           3            1      0      2      4           6            .500000
0           4            0      4      0      4           8            .400000
Total                  117    227     83    393         854

S(wx) = 238.3095,   S(wy) = 503.8740


The estimated proportion of N genes is therefore

$$q = \frac{S(wx)}{S(wy)} = \frac{238.3095}{503.8740} = 47.295 \text{ per cent},$$

to which is attached a variance of

$$V(q) = \frac{0.47295 \times 0.52705}{503.8740} = 0.0004947036,$$

or a standard error of 2.224 per cent.

Had all individuals in the record been treated as unrelated, without
weighting, one would have obtained: q = 393/854 = 46.019 per cent, with
standard error √{(0.46019 × 0.53981)/854} = 1.706 per cent. This estimate,
which is likewise consistent, agrees well with the correctly weighted propor-
tion, but its precision has been falsely exaggerated, the standard error being
only 76.7 per cent of the appropriate value.
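The whole of Table V can be recomputed in a few lines. The Python sketch below takes the genotype counts from the table, forms x = b + 2c and y = 2k for each row, applies the Table VIII weights from their formulas (rather than the six-place decimals, so the last digit may differ slightly), and evaluates (4) and (5); variable names are illustrative.

```python
import math

# (tested parents, children s, a = MM, b = MN, c = NN); None marks unrelated persons
TABLE_V = [
    (None, 0, 23, 54, 20),
    (2, 1, 20, 31, 18), (2, 2, 25, 42, 13), (2, 3, 17, 26, 7), (2, 4, 0, 5, 1),
    (2, 5, 2, 8, 4), (2, 6, 5, 3, 0), (2, 8, 4, 5, 1),
    (1, 1, 4, 10, 2), (1, 2, 8, 16, 6), (1, 3, 1, 6, 1), (1, 5, 4, 2, 0),
    (0, 2, 3, 15, 8), (0, 3, 1, 0, 2), (0, 4, 0, 4, 0),
]


def weight_per_gene(parents, s):
    """Table VIII weights for genes without dominance."""
    if parents is None:
        return 1.0
    if parents == 0:
        return 2 / (s + 1)
    if parents == 1:
        return 2 / (s + 2)
    return 2 * (s + 2) / ((s + 1) * (s + 4))


s_wx = s_wy = 0.0
x_total = y_total = 0
for parents, s, a, b, c in TABLE_V:
    x, y = b + 2 * c, 2 * (a + b + c)  # N genes and total genes, unweighted
    w = weight_per_gene(parents, s)
    s_wx += w * x
    s_wy += w * y
    x_total += x
    y_total += y

q = s_wx / s_wy                      # eq. (4)
se = math.sqrt(q * (1 - q) / s_wy)   # square root of eq. (5)
q_naive = x_total / y_total
se_naive = math.sqrt(q_naive * (1 - q_naive) / y_total)
print(f"S(wx)={s_wx:.4f} S(wy)={s_wy:.4f} q={q:.5f} se={se:.5f}")
print(f"unweighted q={q_naive:.5f} se={se_naive:.5f} ratio={se_naive / se:.3f}")
```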
Since it has been shown that the direct weighting of all individuals con-
tained in families with 2 recorded parents is less efficient than the use of the
parents alone, the above weighted estimate may be slightly improved by
omitting the children of such families. When this is done (Table VI), the
estimate is

$$q = \frac{250.3714}{524.0286} = 47.778 \text{ per cent},$$

with standard error

$$\sqrt{\frac{q(1-q)}{S(wy)}} = 2.182 \text{ per cent}.$$

TABLE VI
ESTIMATION OF GENE RATIO FOR GENES LACKING DOMINANCE (M-N BLOOD TYPES),
EXCLUDING CHILDREN OF TWO TESTED PARENTS

Number of   Number of                        N Genes    Total Genes   Weight
Parents     Children    MM     MN     NN     (b + 2c)   (2k)          per Gene
Tested      s           a      b      c      x          y             w
Unrelated               23     54     20     94         194           1.000000
2           0           31     61     22    105         228           1.000000
1           1            4     10      2     14          32            .666667
1           2            8     16      6     28          60            .500000
1           3            1      6      1      8          16            .400000
1           5            4      2      0      2          12            .285714
0           2            3     15      8     31          52            .666667
0           3            1      0      2      4           6            .500000
0           4            0      4      0      4           8            .400000
Total                   75    168     61    290         608

S(wx) = 250.3714,   S(wy) = 524.0286


The estimated number of genes sampled is now 524.0286. If likelihood scores 
and weights had been used for all families, this number would have been 
slightly larger, with a resulting slight reduction in the standard error. To 
use an inefficient scoring system therefore slightly minimizes the significance 
of a deviation in gene ratio, whereas the use of no weights at all may greatly 
exaggerate the significance. 
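The Table VI estimate follows in the same way once the children of 2-parent families are dropped and their parents are counted as unrelated (w = 1). In this Python sketch the rows are transcribed from Table VI and the weights written as the fractions 2/(s + 1) and 2/(s + 2); the variable names are illustrative.

```python
import math

# (x = N genes, y = total genes, w = weight per gene) for the rows of Table VI
TABLE_VI = [
    (94, 194, 1.0), (105, 228, 1.0),                                   # unrelated; parents
    (14, 32, 2 / 3), (28, 60, 2 / 4), (8, 16, 2 / 5), (2, 12, 2 / 7),  # 1 parent, s = 1, 2, 3, 5
    (31, 52, 2 / 3), (4, 6, 2 / 4), (4, 8, 2 / 5),                     # no parent, s = 2, 3, 4
]

s_wx = sum(w * x for x, _, w in TABLE_VI)
s_wy = sum(w * y for _, y, w in TABLE_VI)
q = s_wx / s_wy                     # eq. (4)
se = math.sqrt(q * (1 - q) / s_wy)  # square root of eq. (5)
print(f"S(wx)={s_wx:.4f} S(wy)={s_wy:.4f} q={q:.5f} se={se:.5f}")
```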

By consolidating the frequencies of genotypes MM and MN in Table V,
a record is obtained which will serve to illustrate the application of direct
weights to data on a dominant factor. The results shown in Table VII are,
therefore, those which would have been obtained if only an anti-M serum had
been used. Owing to the fact that the weights for families tested with
respect to dominant genes are functions of the unknown gene proportion, q,
one must approximate the correctly weighted estimate, using the weights
found in Tables IX, X and XI at the two trial estimates, q = 0.45 and q = 0.50.
Using the weights appropriate for q² = 20.25 and 25.00 per cent recessives, one
finds

$$q^2 = \frac{57.30}{281.58} = 0.20349 \quad \text{and} \quad q^2 = \frac{56.50}{277.34} = 0.20372,$$

respectively. The former is 0.00099 above the trial estimate of 0.2025, while
the latter is 0.04628 below 0.2500. The linear interpolate is

$$q^2 = 0.2025 + 0.0475 \times \frac{0.00099}{0.04727} = 0.203495,$$

and the interpolated value for S(wk) is

$$281.58 - 4.24 \times \frac{0.00099}{0.04727} = 281.49.$$

For the estimate of the gene proportion, q, one may now take

$$q = \sqrt{0.203495} = 45.11 \text{ per cent},$$

with a variance of

$$V(q) = \frac{1-q^2}{4S(wk)} = \frac{0.79651}{4 \times 281.49} = 0.0007074,$$

or a standard error of 2.66 per cent. By comparing this with the estimate
q = 47.778 ± 2.182 per cent, it is seen that precision has been lost by trans-
forming the data into dominants and recessives, as expected. The 2 esti-
mates, however, do not differ significantly from one another, which testifies
to the internal consistency of the data and acceptable fit of the genetic
hypothesis.

TABLE VII
ESTIMATION OF GENE RATIO FOR GENES WITH DOMINANCE (M-N BLOOD TYPES,
WITH MM AND MN GENOTYPES ASSUMED INDISTINGUISHABLE)

Number of   Number of   MM + MN   NN    a + b =   Weight, w,    Weight, w,
Parents     Children                    k         at q = .45    at q = .50
Tested      s           a         b
Unrelated               77        20    97        1.0000        1.0000
2           1           51        18    69         .7073         .6923
2           2           67        13    80         .5485         .5333
2           3           43         7    50         .4482         .4348
2           4            5         1     6         .3791         .3673
2           5           10         4    14         .3285         .3182
2           6            8         0     8         .2898         .2807
2           8            9         1    10         .2346         .2273
1           1           14         2    16         .7632         .7500
1           2           24         6    30         .5939         .5806
1           3            7         1     8         .4823         .4706
1           5            6         0     6         .3487         .3396
0           2           18         8    26         .7117         .7059
0           3            1         2     3         .5524         .5455
0           4            4         0     4         .4514         .4444
Total                  344        83   427

At q = .45: S(wb) = 57.30, S(wk) = 281.58;
At q = .50: S(wb) = 56.50, S(wk) = 277.34.
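The interpolation for the dominant case can be scripted as follows. This Python sketch assumes the rows of Table VII, recomputes the weights at q = 0.45 and 0.50 from (24), (26) and (28) instead of reading the four-place values, locates the zero of (estimate minus trial value) by linear interpolation exactly as above, and then applies (20) and (21). Small differences in the last digit from the printed sums are to be expected; the names are illustrative.

```python
import math

# (tested parents, children s, b = recessives NN, k = persons); None = unrelated
TABLE_VII = [
    (None, 0, 20, 97),
    (2, 1, 18, 69), (2, 2, 13, 80), (2, 3, 7, 50), (2, 4, 1, 6),
    (2, 5, 4, 14), (2, 6, 0, 8), (2, 8, 1, 10),
    (1, 1, 2, 16), (1, 2, 6, 30), (1, 3, 1, 8), (1, 5, 0, 6),
    (0, 2, 8, 26), (0, 3, 2, 3), (0, 4, 0, 4),
]


def weight(parents, s, q):
    """Weights per individual from (24), (26) and (28)."""
    if parents is None:
        return 1.0
    if parents == 0:
        return 4 * (1 + q) / (s + 3 + q * (3 * s + 1))
    if parents == 1:
        return 4 * (s + 1) * (1 + q) / ((1 + 3 * q) * s * (s + 3) + 4 * (1 + q))
    return 4 * (s + 2) * (1 + q) / (s * s + 3 * s + 8 + q * (3 * s * s + 17 * s + 8))


def weighted_sums(q):
    s_wb = sum(weight(p, s, q) * b for p, s, b, _ in TABLE_VII)
    s_wk = sum(weight(p, s, q) * k for p, s, _, k in TABLE_VII)
    return s_wb, s_wk


(wb1, wk1), (wb2, wk2) = weighted_sums(0.45), weighted_sums(0.50)
t1, t2 = 0.45 ** 2, 0.50 ** 2            # trial values of q^2
d1, d2 = wb1 / wk1 - t1, wb2 / wk2 - t2  # eq. (17) estimate minus trial value
frac = d1 / (d1 - d2)                    # linear interpolation for a zero of d
q2 = t1 + (t2 - t1) * frac
s_wk = wk1 + (wk2 - wk1) * frac
q = math.sqrt(q2)                        # eq. (20)
se = math.sqrt((1 - q2) / (4 * s_wk))    # square root of eq. (21)
print(round(wb1, 2), round(wk1, 2), round(wb2, 2), round(wk2, 2))
print(round(q2, 5), round(s_wk, 2), round(q, 4), round(se, 4))
```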


SUMMARY 


When all members of a family or other related group are counted as 
though unrelated, a fixed weight may be assigned each gene or individual 
in the aggregate, and the usual equations for estimating the population gene
ratio may then be applied to the cumulated weights. The appropriate 
weights for genes with and without dominance have been tabulated for
families of various sizes, with neither, one or both parents recorded, and for 
pairs of relatives of any kind. This method of estimation, although simpli- 
fying calculation and permitting of an easier extension to complex groups 
of relatives, is generally inefficient and should be employed only when scores 
and weights for the maximum likelihood solution are not available. 
TABLES OF WEIGHTS

TABLE VIII
GENES WITHOUT DOMINANCE. WEIGHTS PER GENE FOR SIBSHIPS OF s,
WITH NEITHER, ONE, OR BOTH PARENTS TESTED

Number of    Neither Parent    One Parent      Both Parents
Children,    Tested            Tested          Tested
s            w = 2/(s+1)       w = 2/(s+2)     w = 2(s+2)/{(s+1)(s+4)}
 0              -              1.000000        1.000000
 1           1.000000           .666667         .600000
 2            .666667           .500000         .444444
 3            .500000           .400000         .357143
 4            .400000           .333333         .300000
 5            .333333           .285714         .259259
 6            .285714           .250000         .228571
 7            .250000           .222222         .204545
 8            .222222           .200000         .185185
 9            .200000           .181818         .169231
10            .181818           .166667         .155844
11            .166667           .153846         .144444
12            .153846           .142857         .134615
13            .142857           .133333         .126050
14            .133333           .125000         .118519
15            .125000           .117647         .111842
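Table VIII can be regenerated from the formulas in its header. The short Python sketch below uses exact fractions and rounds only for display; a hyphen marks the empty entry for s = 0 with neither parent tested, and the function names are illustrative.

```python
from fractions import Fraction as F


def table_viii_row(s):
    """Weights per gene, genes without dominance, for a sibship of s."""
    neither = F(2, s + 1) if s > 0 else None  # no sibship to weight when s = 0
    one = F(2, s + 2)
    both = F(2 * (s + 2), (s + 1) * (s + 4))
    return neither, one, both


def fmt(w):
    return "-" if w is None else f"{float(w):.6f}"


for s in range(16):
    print(f"{s:>2}", *(f"{fmt(w):>9}" for w in table_viii_row(s)))
```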